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l  A.  AbbIKAUI 

Our  research  asms  to  develop  computational  models  of  inductive  inference  m  higher-level  human  cognition,  specifically  the  ability  to  gcneralire  fi'om  sparse, 
noisy,  ambiguous  data,  and  to  btdd  more  human-hke  machine  learning  systems  Work  has  focused  on  two  areas  of  cognition  learning  about  categories  and 
their  properties,  and  learning  about  relational  structures  —  specifically,  systems  of  causal  and  social  relations.  We  have  developed  a  theory-based  Bayesian 
fi-amework  for  modeling  these  learning  tasks  as  statistical  inference  over  hierarchies  of  structured  knowledge  representations  Our  models  have  made  several 
contributions  First,  they  have  eicplamed  a  broad  range  of  phenomena  with  high  quantitative  accuracy,  using  a  minimum  of  fi-ee  parameters.  Second,  they  have 
provided  a  rational  fi-amework  for  explaining  how  and  why  everyday  induction  works,  m  terms  of  ^prozsnations  to  optimal  statistical  inference  m  natural 
environments  Third,  they  have  provided  tools  for  elucidating  people' s  m^hcit  dieories  about  the  structure  of  tfie  world  -  describing  the  form  and  content  of  the 
pnor  knowledge  that  guides  inductive  inference  and  explaining  how  it  may  be  acquired  fi-om  experience.  Finally,  our  models  have  led  to  improved  algorithms  for 
machine  learning  of  category  structures  and  their  property  distributions,  causal  networks,  and  the  structure  of  social  relations,  thus  bringing  artificial  intelligence 
systems  closer  to  the  capacities  of  human  intelligence  at  the  same  time  as  they  support  a  better  understanding  of  human  intelligence 
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Project  Summary 

Our  research  aims  to  develop  computational  models  of  inductive  inference  in  higher-level  human 
cognition.  Work  lias  focused  on  two  areas  of  cognition:  learning  about  categories  and  their  properties, 
and  learning  about  relational  structures  —  specifically,  systems  of  causal  and  social  relations.  TTie 
modeling  challenge  is  to  explain  how  people  are  able  to  make  strong  inductive  leaps  —  or  generalizations 
to  novel  unseen  cases  ~  that  go  far  beyond  the  sparse,  noisy,  ambiguous  data  they  observe  during 
learning.  Our  research  seeks  to  understand  the  computational  basis  of  these  everyday  inductive  leaps  —  to 
model  them  quantitatively,  to  explain  in  principled  rational  terms  how  they  can  be  as  successful  as  they 
are,  and  to  build  machines  with  the  same  kinds  of  common-sense  inductive  capacities. 

Traditional  accounts  of  induction  emphasize  either  the  power  of  statistical  learning,  or  the  importance  of 
strong  constraints  from  structured  domain  knowledge,  intuitive  theories  or  schemas.  Our  work  is  based  on 
the  premise  that  both  components  are  necessary  to  explain  the  nature,  use  and  acquisition  of  human 
knowledge.  We  have  developed  a  theory-based  Bayesian  framework  for  modeling  inductive  learning  and 
reasoning  as  statistical  inferences  over  hierarchies  of  structured  knowledge  representations.  This  theory- 
based  Bayesian  framework  is  primarily  and  traditionally  posed  at  Marfs  level  of  computational  theory. 
We  have  also  begun  to  explore  rational  algorithmic  or  process-level  accounts  of  human  cognition,  based 
on  sensible  ways  of  approximating  intractable  Bayesian  computations  on  sparse  data. 

Our  theory-based  Bayesian  models  have  made  several  contributions.  First,  they  have  explained  a  broad 
range  of  phenomena  with  high  quantitative  accuracy,  using  a  minimum  of  free  parameters.  Second,  they 
have  provided  a  rational  framework  for  explaining  how  and  why  everyday  induction  works,  in  terms  of 
approximations  to  optimal  statistical  inference  in  natural  environments.  Third,  they  have  provided  tools 
for  elucidating  people’s  implicit  theories  about  the  structure  of  the  world.  For  the  inductive  inference 
tasks  we  study,  people  are  able  to  make  strong  inferences  from  very  limited  data,  and  these  inferences 
must  depend  on  some  form  of  prior  knowledge.  Our  models  provide  ways  of  describing  that  prior 
knowledge  and  explaining  how  it  too  may  be  acquired  from  experience.  Finally,  our  models  offer  a  two- 
way  bridge  to  the  state  of  the  art  in  AI.  They  have  led  to  improved  algorithms  for  machine  learning  of 
category  structures  and  their  property  distributions,  causal  networks,  and  the  structure  of  social  relations, 
thus  bringing  artificial  intelligence  systems  closer  to  the  capacities  of  human  intelligence  at  the  same  time 
as  they  support  a  better  understanding  of  human  intelligence. 

Our  work  makes  several  contributions  to  Air  Force  research  goals.  By  better  characterizing  human 
learning  and  inference  in  computational  terms,  our  models  can  suggest  ways  to  improve  training  and 
agent  modeling  for  simulations.  By  developing  more  human-like  algorithms  for  machine  learning  and 
reasoning,  our  work  could  lead  to  computer  systems  that  can  replace,  supplement  or  extend  the  existing 
capacities  of  Air  Force  personnel.  Finally,  an  extra  payoff  comes  from  pursuing  these  two  goals  together. 
By  developing  belter  machine-learning  and  reasoning  systems  that  operate  on  the  same  principles  as 
human  cognition  does,  w'e  should  be  able  to  make  human-machine  interaction  and  collaboration  more 
valuable  and  efficient. 
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General  technical  approach 


Theory-based  Bayesian  models  of  induction  focus  on  three  critical  questions:  what  is  the  content  of 
people’s  intuitive  theories  of  the  world,  how  are  they  used  to  support  rapid  learning,  and  how  can  they 
themselves  be  learned?  The  learner  evaluates  hypotheses  h  about  some  aspect  of  the  world  --  the 
meaning  of  a  word,  the  extension  of  a  property  or  category,  or  the  presence  of  a  hidden  cause  -  given 
observed  data  d  and  subject  to  the  constraints  of  a  background  theory  T.  Hypotheses  are  scored  by 
computing  posterior  probabilities  via  Bayes'  rule: 


P{h\d,T)  = 


P{d\hJ)P{h\T) 

J^P{d\h',T)P{h'\T) 

h‘^Hj 


(1) 


The  likelihood  P{d\h,T)  measures  how  well  each  hypothesis  predicts  the  data,  while  the  prior  probability 
P(/j|7)  expresses  the  plausibility  of  the  hypothesis  given  the  learner's  background  knowledge.  Posterior 
probabilities  P{h\ic,T)  are  proportional  to  the  product  of  these  two  terms,  representing  the  learner’s  degree 
of  belief  in  each  hypothesis  given  both  the  constraints  of  the  background  theory  T  and  the  observed  data 
d.  Adopting  this  Bayesian  framework  is  just  the  starting  point  for  our  cognitive  models.  The  challenge 
comes  in  specifying  hypothesis  spaces  and  probability  distributions  that  support  Bayesian  inference  for  a 
given  task  and  domain.  In  theory-based  Bayesian  models,  the  domain  theory  plays  this  critical  role. 


More  formally,  the  domain  theory  T  generates  a  space  Ht  of  candidate  hypotheses,  such  as  all  possible 
meanings  for  a  word,  along  with  the  priors  PQi\T)  and  likelihoods  P{d\fi,T).  Prior  probabilities  and 
likelihoods  are  thus  not  simply  statistical  records  of  the  learner’s  previous  observations.  Rather,  they  are 
products  of  abstract  systems  of  knowledge  that  go  substantially  beyond  the  learner’s  direct  experience  of 
the  world,  and  can  take  qualitatively  different  forms  in  different  domains. 


It  is  often  crucial  to  distinguish  multiple  levels  of  knowledge  in  a  theory.  The  base  level  of  a  theory  is  a 
structured  probabilistic  model  that  defines  a  probability  distribution  over  possible  observables  —  entities, 
properties,  variables,  events.  This  model  is  typically  built  on  some  kind  of  graph  structure  capturing 
relations  between  observables,  such  as  a  taxonomic  hierarchy  or  a  causal  network,  together  with  a  set  of 
numerical  parameters.  The  graph  structure  determines  qualitative  aspects  of  the  probabilistic  model;  the 
numerical  parameters  determine  more  fine-grained  quantitative  details.  At  a  higher  level  of  knowledge 
are  abstract  principles  that  generate  the  class  of  structured  models  a  learner  may  consider,  such  as  the 
sf)ecification  that  a  given  domain  is  organized  taxonomically  or  causally.  Inference  at  all  levels  of  this 
theory  hierarchy  -  using  theories  to  infer  unobserved  aspects  of  the  data,  learning  structured  models  given 
the  abstract  domain  principles  of  a  theory,  and  learning  the  abstract  domain  principles  themselves  —  can 
be  carried  out  in  a  unified  and  tractable  way  with  hierarchical  Bayesian  models. 


Tlie  following  sections  describe  more  specific  theory-based  Bayesian  models  for  the  two  areas  of  focus  in 
this  project,  category  learning  and  learning  causal  and  social  relations.  Numbered  references  refer  to 
publications  cited  at  the  end  of  this  report. 


2  Learning  categories 

Much  of  human  knowledge  is  organized  into  categories,  which  summarize  the  properties  of  objects  and 
the  ways  in  which  we  act  towards  them.  For  e.xample,  in  a  military  setting  a  person  might  need  to 
discriminate  many  different  categories  of  vehicles,  knowing  the  capabilities  of  each,  and  know  how  to 
identify  whether  those  vehicles  are  likely  to  be  friends  or  foes.  How  are  categories  structured,  and  how 
are  they  learned?  We  have  developed  theory-based  Bayesian  models  to  answer  these  questions  using 


several  techniques  that  build  on  -  and  contribute  to  -  state-of-the-art  research  in  machine  learning, 
statistics,  and  artificial  intelligence.  Nonparametric  Bayesian  methods  allow  us  to  build  models  of 
categorization  whose  complexity  does  not  have  to  be  fixed  in  advance.  The  number  of  categories,  along 
with  the  statistical  features  of  each  category,  can  be  discovered  automatically  from  observations. 

Inference  can  be  performed  using  Monte  Carlo  methods  that  grow  the  effective  number  of  categories  just 
as  the  data  require,  naturally  embodying  a  version  of  Occam’s  razor  that  balances  representational 
simplicity  and  fit  to  the  data  in  inferring  the  appropriate  number  of  categories.  Hierarchical  Bayesian 
methods  allow  us  to  express  higher-level  abstract  constraints  on  the  learner’s  more  concrete  hypotheses  in 
forms  that  can  themselves  be  learned  from  data  Probabilistic  models  defined  over  rule-based 
representations  can  capture  concepts  whose  structure  is  defined  symbolically.  Priors  on  rule-based 
concepts  can  be  defined  using  probabilistic  grammars  -  again  embodying  a  natural  version  of  Occam’s 
razor  in  a  wxiy  that  reflects  the  intuitions  of  “minimum  description  length”  (MDL)  learning,  but  embedded 
in  a  probabilistic  framework  that  gives  a  principled  basis  for  people  or  machines  should  express 
uncertainty  about  category  membership. 

We  have  used  these  methods  separately  and  in  combination  to  model  a  number  of  central  phenomena  in 
human  categoiy  learning,  and  to  build  more  human-like  machine  learning  systems.  W'ith  Charles  Kemp, 
Amy  Perfors,  and  Mike  Frank,  we  have  built  hierarchical  nonparametric  models  for  how  children  learn  to 
learn  categories  and  word  meanings  (1 , 25),  and  syntactic  constructions  (8,  23).  These  models  not  only 
learn  new  concepts  from  few  examples,  but  also  learn  what  abstract  properties  “good”  concepts  have  in 
common  which  allows  them  to  learn  how  to  learn  new  concepts  more  quickly.  With  Mike  Frank  and 
Noah  Goodman,  we  have  built  hierarchical  Bayesian  models  for  learning  word  meanings  that  Jointly  infer 
lexical  concepts  and  speakers’  referential  intentions  in  particular  communicative  acts,  which  substantially 
improves  accuracy  of  learned  meanings  (5,  17). 

We  have  built  hierarchical  nonparametric  models  for  discovering  multiple  cross-cutting  vv'ays  to 
categorize  a  given  domain  (13)-  for  instance,  learning  that  animals  are  best  grouped  taxonomically  (into 
mammals,  fish,  birds,  reptiles,  etc.)  in  order  to  explain  their  anatomical  and  physiological  features,  but 
can  also  be  grouped  according  to  ecological  niches  (e.g.,  land,  air,  sea  predators  or  prey)  in  order  to 
explain  their  behavioral  features. 

With  Noah  Goodman,  Tom  Griffiths  and  Jacob  Feldman,  we  have  built  grammar-based  models  for 
learning  rule-based  concepts  (2,  30).  We  have  begun  to  explore  how  these  grammar-based  approaches 
can  be  applied  to  learning  compositional  aspects  of  semantics,  with  Steve  Piantadosi  (20),  and  to  learning 
structured  object  concepts,  with  Virginia  Savova  and  Frank  Jakel  (22,  26). 

With  Charles  Kemp,  we  have  built  hicrarcliical  Bayesian  models  on  top  of  grammar-based  representations 
for  learning  the  abstract  structural  form  of  a  domain  (3).  Different  grammars  can  capture  the  tree- 
structure  of  taxonomic  categories  in  biology,  the  linear  structure  of  political  identities  in  voting  systems, 
or  the  low-dimensional  spatial  structure  of  perceptual  categories  for  faces  or  colors.  With  Kemp  and  also 
Pat  Shafto  we  have  shown  how  probabilistic  models  with  appropriate  abstract  structural  forms  can  capture 
people’s  property  induction  judgments  in  a  range  of  domains  (4,  29). 

Finally,  building  on  the  work  above  and  also  inspired  by  work  of  Tom  Griffiths  and  colleagues  on 
nonparametric  models  of  human  concept  learning,  we  have  built  nonparametric  models  for  machine 
concept  learning  that  can  learn  more  complex  concepts  more  accurately  than  previous  Bayesian 
classification  systems  (15). 
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Learning  causal  and  social  relations 


Categories  are  fundamental  to  how  we  organize  our  knowledge  of  the  world,  but  they  do  not  exhaust  our 
knowledge.  People  also  deploy  much  richer  systems  of  knowledge  -  what  cognitive  scientists  have  called 
intuitive  theories.  To  explain  how  people  learn  and  use  intuitive  theories,  we  have  taken  the  technical 
methods  described  above  for  modeling  category  learning  and  extended  them  to  work  with  relational 
representations:  representations  using  predicate  logic  that  describe  how  entities  of  one  or  more  types 
relate  to  each  other.  Our  focus  has  been  on  systems  of  causal  relations  and  social  relations,  and  on  the 
abstract  knowledge  that  constrains  how  these  systems  can  be  induced  from  sparse  data. 

With  Tom  Griffiths  we  have  shown  how  to  define  Bayesian  models  over  predicate  logic  representations 
to  capture  intuitive  theories  of  simple  causal  systems,  and  the  constraints  these  place  on  learning  specific 
causal  relations  (6,  27,  28).  With  Charles  Kemp,  Noah  Goodman,  Yarden  Katz  and  Tomer  Ullman  we 
have  shown  how  to  learn  abstract  knowledge  of  causality  -  including  abstract  theories  of  specific  causal 
domains  as  w'cll  as  more  general  knowledge  of  how  causality  works  across  domains  (9,  10,  2 1 )  -  by 
combining  both  hierarchical  Bayes  and  nonparametric  Bayes  methods  with  logical  representations.  With 
Liz  Bonawitz  we  have  tested  a  simple  version  of  this  approach  in  experiments  with  pre-school  age 
children,  showing  that  they  can  learn  appropriate  laws  and  abstract  categories  for  the  theory  of  magnet 
poles  in  ways  that  mirror  the  historical  development  of  these  concepts  in  science  (12). 

With  Noah  Goodman  we  have  modeled  how  people  can  ground  abstract  causal  knowledge  in  perception, 
learning  how  to  carve  up  the  continuous  spatiotemporal  flux  of  perceptual  experience  into  discrete  events 
at  the  same  time  as  they  learn  how  these  events  are  related  causally  (16).  Specifically,  we  have  shown 
empirically  that  human  learning  of  perceptually  grounded  causal  models  is  best  explained,  as  in  our 
model,  in  terms  of  a  joint  Bayesian  inference  about  both  the  variable  structure  and  the  causal  structure 
relating  those  variables.  With  David  Wingate  and  Dan  Roy,  we  have  shown  how  the  same  ideas  can 
serve  as  the  basis  for  more  sophisticated  nonparametric  latent-event  models  of  dynamical  systems  (24). 

We  have  applied  analogous  methods  for  hierarchical  and  nonparametric  Bayesian  learning  over  predicate 
logic  representations  to  model  how  people  learn  a  wide  range  of  social  relations,  and  most  interestingly, 
how  they  can  learn  the  abstract  form  of  a  relational  system  that  supports  generalization  about  the  social 
relations  of  new  agents  from  minimal  observations  (7,  18,  19).  For  instance,  we  have  modeled  how 
people  can  learn  that  a  particular  social  system  follows  a  tree  structure,  or  a  ring  structure,  or  a  clique 
structure,  and  use  that  knowledge  to  infer  which  social  relations  a  new  actor  will  engage  in  after  seeing 
just  a  single  interaction  between  that  actor  and  one  other  person  in  the  network. 

Finally,  with  Dan  Roy,  we  have  developed  new  nonparametric  Bayesian  machine  learning  methods  for 
relational  data;  our  methods  infer  a  hierarchical  tree  structure  of  latent  groups  that  best  explains  a  whole 
set  of  social  relations  based  on  possibly  difTerent  tree-consistent  partitions  of  the  objects  for  each  relation 
(14).  Roy  and  colleagues  (chiefly  Yee  Whye  Teh)  have  extended  these  approaches  to  a  novel  class  of 
nonparametric  models  known  as  the  Mondrian  Process. 
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